MLOps Project: Environment Setup¶

Introduction¶

In this notebook, the development environment will be set up and all necessary tools for the project Binary Classification of Income (over/under 50,000) using the "Adult income dataset" will be installed.

Lernziele¶

After completing this notebook, the following will be achieved:

  • Have a working Python environment with all required packages
  • Understand the basic project structure
  • Have initialized a Git repository
  • Have verified the functionality of all MLOps tools

1. Environment setup¶

1.1 Create a virtual environment¶

These commands will be executed in the terminal:

python -m venv mlops-venv
# Unter Windows
.\mlops-venv\Scripts\activate
# Unter Unix/MacOS
source mlops-venv/bin/activate

1.2 Install dependencies¶

The following packages are installed:

In [49]:
# cell 1: Install the required packages
!pip install numpy pandas scikit-learn mlflow pytest fastapi uvicorn great-expectations docker python-dotenv matplotlib seaborn
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: numpy in c:\programdata\anaconda3\lib\site-packages (1.26.4)
Requirement already satisfied: pandas in c:\users\tanli\appdata\roaming\python\python312\site-packages (2.1.4)
Requirement already satisfied: scikit-learn in c:\programdata\anaconda3\lib\site-packages (1.5.1)
Requirement already satisfied: mlflow in c:\users\tanli\appdata\roaming\python\python312\site-packages (2.20.2)
Requirement already satisfied: pytest in c:\programdata\anaconda3\lib\site-packages (7.4.4)
Requirement already satisfied: fastapi in c:\users\tanli\appdata\roaming\python\python312\site-packages (0.115.8)
Requirement already satisfied: uvicorn in c:\users\tanli\appdata\roaming\python\python312\site-packages (0.34.0)
Requirement already satisfied: great-expectations in c:\users\tanli\appdata\roaming\python\python312\site-packages (1.3.6)
Requirement already satisfied: docker in c:\users\tanli\appdata\roaming\python\python312\site-packages (7.1.0)
Requirement already satisfied: python-dotenv in c:\programdata\anaconda3\lib\site-packages (0.21.0)
Requirement already satisfied: matplotlib in c:\programdata\anaconda3\lib\site-packages (3.9.2)
Requirement already satisfied: seaborn in c:\programdata\anaconda3\lib\site-packages (0.13.2)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in c:\programdata\anaconda3\lib\site-packages (from pandas) (2024.1)
Requirement already satisfied: tzdata>=2022.1 in c:\programdata\anaconda3\lib\site-packages (from pandas) (2023.3)
Requirement already satisfied: scipy>=1.6.0 in c:\programdata\anaconda3\lib\site-packages (from scikit-learn) (1.13.1)
Requirement already satisfied: joblib>=1.2.0 in c:\programdata\anaconda3\lib\site-packages (from scikit-learn) (1.4.2)
Requirement already satisfied: threadpoolctl>=3.1.0 in c:\programdata\anaconda3\lib\site-packages (from scikit-learn) (3.5.0)
Requirement already satisfied: mlflow-skinny==2.20.2 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from mlflow) (2.20.2)
Requirement already satisfied: Flask<4 in c:\programdata\anaconda3\lib\site-packages (from mlflow) (3.0.3)
Requirement already satisfied: Jinja2<4,>=3.0 in c:\programdata\anaconda3\lib\site-packages (from mlflow) (3.1.4)
Requirement already satisfied: alembic!=1.10.0,<2 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from mlflow) (1.14.1)
Requirement already satisfied: graphene<4 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from mlflow) (3.4.3)
Requirement already satisfied: markdown<4,>=3.3 in c:\programdata\anaconda3\lib\site-packages (from mlflow) (3.4.1)
Requirement already satisfied: pyarrow<19,>=4.0.0 in c:\programdata\anaconda3\lib\site-packages (from mlflow) (16.1.0)
Requirement already satisfied: sqlalchemy<3,>=1.4.0 in c:\programdata\anaconda3\lib\site-packages (from mlflow) (2.0.34)
Requirement already satisfied: waitress<4 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from mlflow) (3.0.2)
Requirement already satisfied: cachetools<6,>=5.0.0 in c:\programdata\anaconda3\lib\site-packages (from mlflow-skinny==2.20.2->mlflow) (5.3.3)
Requirement already satisfied: click<9,>=7.0 in c:\programdata\anaconda3\lib\site-packages (from mlflow-skinny==2.20.2->mlflow) (8.1.7)
Requirement already satisfied: cloudpickle<4 in c:\programdata\anaconda3\lib\site-packages (from mlflow-skinny==2.20.2->mlflow) (3.0.0)
Requirement already satisfied: databricks-sdk<1,>=0.20.0 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from mlflow-skinny==2.20.2->mlflow) (0.44.1)
Requirement already satisfied: gitpython<4,>=3.1.9 in c:\programdata\anaconda3\lib\site-packages (from mlflow-skinny==2.20.2->mlflow) (3.1.43)
Requirement already satisfied: importlib_metadata!=4.7.0,<9,>=3.7.0 in c:\programdata\anaconda3\lib\site-packages (from mlflow-skinny==2.20.2->mlflow) (7.0.1)
Requirement already satisfied: opentelemetry-api<3,>=1.9.0 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from mlflow-skinny==2.20.2->mlflow) (1.30.0)
Requirement already satisfied: opentelemetry-sdk<3,>=1.9.0 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from mlflow-skinny==2.20.2->mlflow) (1.30.0)
Requirement already satisfied: packaging<25 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from mlflow-skinny==2.20.2->mlflow) (23.2)
Requirement already satisfied: protobuf<6,>=3.12.0 in c:\programdata\anaconda3\lib\site-packages (from mlflow-skinny==2.20.2->mlflow) (4.25.3)
Requirement already satisfied: pydantic<3,>=1.10.8 in c:\programdata\anaconda3\lib\site-packages (from mlflow-skinny==2.20.2->mlflow) (2.8.2)
Requirement already satisfied: pyyaml<7,>=5.1 in c:\programdata\anaconda3\lib\site-packages (from mlflow-skinny==2.20.2->mlflow) (6.0.1)
Requirement already satisfied: requests<3,>=2.17.3 in c:\programdata\anaconda3\lib\site-packages (from mlflow-skinny==2.20.2->mlflow) (2.32.3)
Requirement already satisfied: sqlparse<1,>=0.4.0 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from mlflow-skinny==2.20.2->mlflow) (0.5.3)
Requirement already satisfied: typing-extensions<5,>=4.0.0 in c:\programdata\anaconda3\lib\site-packages (from mlflow-skinny==2.20.2->mlflow) (4.11.0)
Requirement already satisfied: iniconfig in c:\programdata\anaconda3\lib\site-packages (from pytest) (1.1.1)
Requirement already satisfied: pluggy<2.0,>=0.12 in c:\programdata\anaconda3\lib\site-packages (from pytest) (1.0.0)
Requirement already satisfied: colorama in c:\users\tanli\appdata\roaming\python\python312\site-packages (from pytest) (0.4.6)
Requirement already satisfied: starlette<0.46.0,>=0.40.0 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from fastapi) (0.45.3)
Requirement already satisfied: h11>=0.8 in c:\programdata\anaconda3\lib\site-packages (from uvicorn) (0.14.0)
Requirement already satisfied: altair<5.0.0,>=4.2.1 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from great-expectations) (4.2.2)
Requirement already satisfied: cryptography>=3.2 in c:\programdata\anaconda3\lib\site-packages (from great-expectations) (43.0.0)
Requirement already satisfied: jsonschema>=2.5.1 in c:\programdata\anaconda3\lib\site-packages (from great-expectations) (4.23.0)
Requirement already satisfied: marshmallow<4.0.0,>=3.7.1 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from great-expectations) (3.26.1)
Requirement already satisfied: mistune>=0.8.4 in c:\programdata\anaconda3\lib\site-packages (from great-expectations) (2.0.4)
Requirement already satisfied: posthog<4,>3 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from great-expectations) (3.13.0)
Requirement already satisfied: pyparsing>=2.4 in c:\programdata\anaconda3\lib\site-packages (from great-expectations) (3.1.2)
Requirement already satisfied: ruamel.yaml>=0.16 in c:\programdata\anaconda3\lib\site-packages (from great-expectations) (0.18.6)
Requirement already satisfied: tqdm>=4.59.0 in c:\programdata\anaconda3\lib\site-packages (from great-expectations) (4.66.5)
Requirement already satisfied: tzlocal>=1.2 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from great-expectations) (5.3)
Requirement already satisfied: pywin32>=304 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from docker) (306)
Requirement already satisfied: urllib3>=1.26.0 in c:\programdata\anaconda3\lib\site-packages (from docker) (2.2.3)
Requirement already satisfied: contourpy>=1.0.1 in c:\programdata\anaconda3\lib\site-packages (from matplotlib) (1.2.0)
Requirement already satisfied: cycler>=0.10 in c:\programdata\anaconda3\lib\site-packages (from matplotlib) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in c:\programdata\anaconda3\lib\site-packages (from matplotlib) (4.51.0)
Requirement already satisfied: kiwisolver>=1.3.1 in c:\programdata\anaconda3\lib\site-packages (from matplotlib) (1.4.4)
Requirement already satisfied: pillow>=8 in c:\programdata\anaconda3\lib\site-packages (from matplotlib) (10.4.0)
Requirement already satisfied: Mako in c:\users\tanli\appdata\roaming\python\python312\site-packages (from alembic!=1.10.0,<2->mlflow) (1.3.9)
Requirement already satisfied: entrypoints in c:\users\tanli\appdata\roaming\python\python312\site-packages (from altair<5.0.0,>=4.2.1->great-expectations) (0.4)
Requirement already satisfied: toolz in c:\programdata\anaconda3\lib\site-packages (from altair<5.0.0,>=4.2.1->great-expectations) (0.12.0)
Requirement already satisfied: cffi>=1.12 in c:\programdata\anaconda3\lib\site-packages (from cryptography>=3.2->great-expectations) (1.17.1)
Requirement already satisfied: Werkzeug>=3.0.0 in c:\programdata\anaconda3\lib\site-packages (from Flask<4->mlflow) (3.0.3)
Requirement already satisfied: itsdangerous>=2.1.2 in c:\programdata\anaconda3\lib\site-packages (from Flask<4->mlflow) (2.2.0)
Requirement already satisfied: blinker>=1.6.2 in c:\programdata\anaconda3\lib\site-packages (from Flask<4->mlflow) (1.6.2)
Requirement already satisfied: graphql-core<3.3,>=3.1 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from graphene<4->mlflow) (3.2.6)
Requirement already satisfied: graphql-relay<3.3,>=3.1 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from graphene<4->mlflow) (3.2.0)
Requirement already satisfied: MarkupSafe>=2.0 in c:\programdata\anaconda3\lib\site-packages (from Jinja2<4,>=3.0->mlflow) (2.1.3)
Requirement already satisfied: attrs>=22.2.0 in c:\programdata\anaconda3\lib\site-packages (from jsonschema>=2.5.1->great-expectations) (23.1.0)
Requirement already satisfied: jsonschema-specifications>=2023.03.6 in c:\programdata\anaconda3\lib\site-packages (from jsonschema>=2.5.1->great-expectations) (2023.7.1)
Requirement already satisfied: referencing>=0.28.4 in c:\programdata\anaconda3\lib\site-packages (from jsonschema>=2.5.1->great-expectations) (0.30.2)
Requirement already satisfied: rpds-py>=0.7.1 in c:\programdata\anaconda3\lib\site-packages (from jsonschema>=2.5.1->great-expectations) (0.10.6)
Requirement already satisfied: six>=1.5 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from posthog<4,>3->great-expectations) (1.16.0)
Requirement already satisfied: monotonic>=1.5 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from posthog<4,>3->great-expectations) (1.6)
Requirement already satisfied: backoff>=1.10.0 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from posthog<4,>3->great-expectations) (2.2.1)
Requirement already satisfied: annotated-types>=0.4.0 in c:\programdata\anaconda3\lib\site-packages (from pydantic<3,>=1.10.8->mlflow-skinny==2.20.2->mlflow) (0.6.0)
Requirement already satisfied: pydantic-core==2.20.1 in c:\programdata\anaconda3\lib\site-packages (from pydantic<3,>=1.10.8->mlflow-skinny==2.20.2->mlflow) (2.20.1)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\programdata\anaconda3\lib\site-packages (from requests<3,>=2.17.3->mlflow-skinny==2.20.2->mlflow) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in c:\programdata\anaconda3\lib\site-packages (from requests<3,>=2.17.3->mlflow-skinny==2.20.2->mlflow) (3.7)
Requirement already satisfied: certifi>=2017.4.17 in c:\programdata\anaconda3\lib\site-packages (from requests<3,>=2.17.3->mlflow-skinny==2.20.2->mlflow) (2024.8.30)
Requirement already satisfied: ruamel.yaml.clib>=0.2.7 in c:\programdata\anaconda3\lib\site-packages (from ruamel.yaml>=0.16->great-expectations) (0.2.8)
Requirement already satisfied: greenlet!=0.4.17 in c:\programdata\anaconda3\lib\site-packages (from sqlalchemy<3,>=1.4.0->mlflow) (3.0.1)
Requirement already satisfied: anyio<5,>=3.6.2 in c:\programdata\anaconda3\lib\site-packages (from starlette<0.46.0,>=0.40.0->fastapi) (4.2.0)
Requirement already satisfied: sniffio>=1.1 in c:\programdata\anaconda3\lib\site-packages (from anyio<5,>=3.6.2->starlette<0.46.0,>=0.40.0->fastapi) (1.3.0)
Requirement already satisfied: pycparser in c:\programdata\anaconda3\lib\site-packages (from cffi>=1.12->cryptography>=3.2->great-expectations) (2.21)
Requirement already satisfied: google-auth~=2.0 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from databricks-sdk<1,>=0.20.0->mlflow-skinny==2.20.2->mlflow) (2.38.0)
Requirement already satisfied: gitdb<5,>=4.0.1 in c:\programdata\anaconda3\lib\site-packages (from gitpython<4,>=3.1.9->mlflow-skinny==2.20.2->mlflow) (4.0.7)
Requirement already satisfied: zipp>=0.5 in c:\programdata\anaconda3\lib\site-packages (from importlib_metadata!=4.7.0,<9,>=3.7.0->mlflow-skinny==2.20.2->mlflow) (3.17.0)
Requirement already satisfied: deprecated>=1.2.6 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from opentelemetry-api<3,>=1.9.0->mlflow-skinny==2.20.2->mlflow) (1.2.18)
Requirement already satisfied: opentelemetry-semantic-conventions==0.51b0 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from opentelemetry-sdk<3,>=1.9.0->mlflow-skinny==2.20.2->mlflow) (0.51b0)
Requirement already satisfied: wrapt<2,>=1.10 in c:\programdata\anaconda3\lib\site-packages (from deprecated>=1.2.6->opentelemetry-api<3,>=1.9.0->mlflow-skinny==2.20.2->mlflow) (1.14.1)
Requirement already satisfied: smmap<5,>=3.0.1 in c:\programdata\anaconda3\lib\site-packages (from gitdb<5,>=4.0.1->gitpython<4,>=3.1.9->mlflow-skinny==2.20.2->mlflow) (4.0.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in c:\programdata\anaconda3\lib\site-packages (from google-auth~=2.0->databricks-sdk<1,>=0.20.0->mlflow-skinny==2.20.2->mlflow) (0.2.8)
Requirement already satisfied: rsa<5,>=3.1.4 in c:\users\tanli\appdata\roaming\python\python312\site-packages (from google-auth~=2.0->databricks-sdk<1,>=0.20.0->mlflow-skinny==2.20.2->mlflow) (4.9)
Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in c:\programdata\anaconda3\lib\site-packages (from pyasn1-modules>=0.2.1->google-auth~=2.0->databricks-sdk<1,>=0.20.0->mlflow-skinny==2.20.2->mlflow) (0.4.8)

1.3 Checking installation¶

To verify if all packages are correctly installed, you can use the following code snippet in your Jupyter notebook. This code will attempt to import the necessary packages and print a success message if all imports are successful:

In [3]:
# Zelle 2: Import und Versionscheck
import sys
import numpy as np
import pandas as pd
import mlflow
import great_expectations as ge
from fastapi import FastAPI
import pytest
from sklearn.impute import SimpleImputer

# Versionen ausgeben
print(f"Python Version: {sys.version}")
print(f"NumPy Version: {np.__version__}")
print(f"Pandas Version: {pd.__version__}")
print(f"MLflow Version: {mlflow.__version__}")
print(f"Great Expectations Version: {ge.__version__}")
Python Version: 3.12.7 | packaged by Anaconda, Inc. | (main, Oct  4 2024, 13:17:27) [MSC v.1929 64 bit (AMD64)]
NumPy Version: 1.26.4
Pandas Version: 2.1.4
MLflow Version: 2.20.2
Great Expectations Version: 1.3.6

2. Projekt structure¶

The project structure will be as follows:

adult_income/
├── data/
│   ├── raw/
│   └── processed/
├── notebooks/
│   ├── 00_Umgebung_Einrichtung.ipynb
│   ├── 01_Daten_Exploration.ipynb
│   ├── 02_Daten_Vorverarbeitung.ipynb
│   ├── 03_Modell_Engineering.ipynb
│   └── 04_Modell_Deployment.ipynb
├── src/
│   ├── data/
│   ├── features/
│   ├── models/
│   └── api/
├── tests/
├── .gitignore
├── README.md
└── requirements.txt

The structure is created:

In [4]:
# Zelle 3: Projektstruktur erstellen
import os

def create_project_structure():
    # Verzeichnisstruktur definieren
    directories = [
        'data/raw',
        'data/processed',
        'notebooks',
        'src/data',
        'src/features',
        'src/models',
        'src/api',
        'tests'
    ]
    
    # Verzeichnisse erstellen
    for dir_path in directories:
        os.makedirs(dir_path, exist_ok=True)
        print(f"Verzeichnis erstellt: {dir_path}")

create_project_structure()
Verzeichnis erstellt: data/raw
Verzeichnis erstellt: data/processed
Verzeichnis erstellt: notebooks
Verzeichnis erstellt: src/data
Verzeichnis erstellt: src/features
Verzeichnis erstellt: src/models
Verzeichnis erstellt: src/api
Verzeichnis erstellt: tests

3. Git setup¶

3.1 Git Repository initialising¶

The commands are executed in terminal:

git init

3.2 create .gitignore¶

.gitignore file is created:

In [7]:
# Zelle 4: .gitignore erstellen
gitignore_content = """
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# Virtuelle Umgebung
mlops-venv/
venv/
ENV/

# Jupyter Notebook
.ipynb_checkpoints

# MLflow
mlruns/

# Daten
data/raw/*
data/processed/*
!data/raw/.gitkeep
!data/processed/.gitkeep

# IDE
.idea/
.vscode/
"""

with open('.gitignore', 'w') as f:
    f.write(gitignore_content)
print(".gitignore Datei wurde erstellt")
.gitignore Datei wurde erstellt

4. Download dataset¶

The dataset Adult Income is downloaded from here: https://www.kaggle.com/datasets/wenruliu/adult-income-dataset/data

In [5]:
#https://raw.githubusercontent.com/vladislabv/fhswf-mlops-project/refs/heads/1_Datenverarbeitung_lip/adult.csv

# Zelle 5: Datensatz herunterladen
import pandas as pd

url = "https://raw.githubusercontent.com/vladislabv/fhswf-mlops-project/refs/heads/1_Datenverarbeitung_lip/adult.csv"
df = pd.read_csv(url)
df.to_csv('data/raw/adult-income.csv', index=False)
print("The dataset was downloaded and saved in data/raw/adult-income.csv")
The dataset was downloaded and saved in data/raw/adult-income.csv

The following columns are given in the dataset:

Age, workclass, fnlwgt, education, educational-num, marital-status, occupation, relationship, race, gender, capital-gain, capital-loss, hours-per-week, native-country, income

In [6]:
# First, we will get an overview about the dataset

print(df.info())
print(df.head())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48842 entries, 0 to 48841
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   age              48842 non-null  int64 
 1   workclass        48842 non-null  object
 2   fnlwgt           48842 non-null  int64 
 3   education        48842 non-null  object
 4   educational-num  48842 non-null  int64 
 5   marital-status   48842 non-null  object
 6   occupation       48842 non-null  object
 7   relationship     48842 non-null  object
 8   race             48842 non-null  object
 9   gender           48842 non-null  object
 10  capital-gain     48842 non-null  int64 
 11  capital-loss     48842 non-null  int64 
 12  hours-per-week   48842 non-null  int64 
 13  native-country   48842 non-null  object
 14  income           48842 non-null  object
dtypes: int64(6), object(9)
memory usage: 5.6+ MB
None
   age  workclass  fnlwgt     education  educational-num      marital-status  \
0   25    Private  226802          11th                7       Never-married   
1   38    Private   89814       HS-grad                9  Married-civ-spouse   
2   28  Local-gov  336951    Assoc-acdm               12  Married-civ-spouse   
3   44    Private  160323  Some-college               10  Married-civ-spouse   
4   18          ?  103497  Some-college               10       Never-married   

          occupation relationship   race  gender  capital-gain  capital-loss  \
0  Machine-op-inspct    Own-child  Black    Male             0             0   
1    Farming-fishing      Husband  White    Male             0             0   
2    Protective-serv      Husband  White    Male             0             0   
3  Machine-op-inspct      Husband  Black    Male          7688             0   
4                  ?    Own-child  White  Female             0             0   

   hours-per-week native-country income  
0              40  United-States  <=50K  
1              50  United-States  <=50K  
2              40  United-States   >50K  
3              40  United-States   >50K  
4              30  United-States  <=50K  

There are values missing. The missing values are marked with '?' in the dataset. The parameters used in the dataset for adult income prediction are:

  • age: the age of an individual
  • workclass: a general term to represent the employment status of an individual
  • fnlwgt: final weight. This is the number of people the census believes the entry represents..
  • education: the highest level of education achieved by an individual.
  • education-num: the highest level of education achieved in numerical form.
  • marital-status: marital status of an individual.
  • occupation: the general type of occupation of an individual
  • relationship: represents what this individual is relative to others.
  • race: Descriptions of an individual’s race
  • sex: the sex of the individual
  • capital-gain: capital gains for an individual
  • capital-loss: capital loss for an individual
  • hours per week: the hours an individual has reported to work per week
  • native country: country of origin for an individual

In the next step, we will start with data exploration.

In [ ]: